Why Large Closest String Instances Are Easy to Solve in Practice

Authors

  • Christina Boucher
  • Kathleen Wilkie
Abstract

We initiate the study of the smoothed complexity of the Closest String problem by proposing a semi-random model of Hamming distance. We restrict interest to the optimization version of the Closest String problem, and give a 2-approximation algorithm, which we refer to as CSP-Greedy, that runs in O(nℓ + ℓ)-time, where ℓ is the string length and n is the number of strings. Using smoothed analysis, we prove CSP-Greedy achieves a (1 + 2ε)-approximation guarantee, where ε > 0 is a small value. This approximation and runtime guarantee demonstrates that Closest String instances with a relatively large number of input strings are efficiently solved in practice. We also give experimental results demonstrating that CSP-Greedy runs efficiently on instances with a large number of strings. The counter-intuitive fact that large Closest String instances are relatively easy and efficient to solve gives new insight into this well-investigated problem.

Related resources

A Closer Look at the Closest String and Closest Substring Problem

Let S be a set of k strings over an alphabet Σ; each string has a length between ℓ and n. The Closest Substring Problem (CSSP) is to find a minimal integer d (and a corresponding string t of length ℓ) such that each string s ∈ S has a substring of length ℓ with Hamming distance at most d to t. We say t is the closest substring to S. For ℓ = n, this problem is known as the Closest String Problem...
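
The CSSP objective defined in this snippet can be illustrated with a minimal Python sketch. It only evaluates a given candidate t, reporting the smallest d for which every string in S has a length-ℓ substring within Hamming distance d of t; it does not search for the best t. The function names `hamming` and `substring_radius` are hypothetical and are not taken from the paper.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def substring_radius(t: str, strings: list[str]) -> int:
    """Smallest d such that every string has a window of len(t) within d of t.

    Assumes every string in `strings` is at least as long as t.
    """
    ell = len(t)
    return max(
        # Best-matching window of length ell inside s.
        min(hamming(s[i:i + ell], t) for i in range(len(s) - ell + 1))
        for s in strings
    )


# Illustrative input only.
radius = substring_radius("ACG", ["TACGT", "ACCAA", "GGACG"])
```

When every input string has the same length as t, each string has exactly one window, and the value reduces to the Closest String objective (the maximum Hamming distance from t to any input string).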

Combinatorial and Probabilistic Approaches to Motif Recognition

Short substrings of genomic data that are responsible for biological processes, such as gene expression, are referred to as motifs. Motifs with the same function may not entirely match, due to mutation events at a few of the motif positions. Allowing for non-exact occurrences significantly complicates their discovery. Given a number of DNA strings, the motif recognition problem is the task of d...

A Mathematical Model and Grouping Imperialist Competitive Algorithm for Integrated Quay Crane and Yard Truck Scheduling Problem with Non-crossing Constraint

In this research, an integrated approach is presented to simultaneously solve quay crane scheduling and yard truck scheduling problems. A mathematical model was proposed considering the main real-world assumptions such as quay crane non-crossing, precedence constraints and variable berthing times for vessels with the aim of minimizing vessels completion time. Based on the numerical results, thi...

An electromagnetism-like metaheuristic for open-shop problems with no buffer

This paper considers open-shop scheduling with no intermediate buffer to minimize total tardiness. This problem occurs in many production settings, in the plastic molding, chemical, and food processing industries. The paper mathematically formulates the problem by a mixed integer linear program. The problem can be optimally solved by the model. The paper also develops a novel metaheuristic base...

The Bounded Search Tree Algorithm for the Closest String Problem Has Quadratic Smoothed Complexity

Given a set S of n strings, each of length ℓ, and a nonnegative value d, we define a center string as a string of length ℓ that has Hamming distance at most d from each string in S. The Closest String problem aims to determine whether there exists a center string for a given set of strings S and input parameters n, ℓ, and d. When n is relatively large with respect to ℓ then the basic majority a...
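
The center-string definition in this snippet translates directly into a check. The sketch below is a minimal Python illustration that verifies whether a given candidate is a center string for S and d; it is not the bounded search tree algorithm the paper analyzes, and the names `hamming` and `is_center_string` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def is_center_string(candidate: str, strings: list[str], d: int) -> bool:
    """True if candidate is within Hamming distance d of every input string."""
    return all(hamming(candidate, s) <= d for s in strings)


# Illustrative input only: three strings of length 4.
S = ["ACGA", "TCGT", "ACCT"]
ok = is_center_string("ACGT", S, 1)       # each string differs in one position
too_tight = is_center_string("ACGT", S, 0)
```

The check costs O(nℓ) character comparisons for n strings of length ℓ. The hard part of the decision problem is finding a candidate, not verifying one.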

Publication date: 2010